Medical treatment planning via sequential games

ABSTRACT

A method and system for identifying a treatment plan identifies a description of a sequential game. The game is associated with treatment of a medical condition by a course of treatment or drug design. The description may include one or more possible treatment actions that a treater can take to treat the medical condition, and one or more possible medical condition actions that the medical condition can take. The system may develop a model for the sequential game, wherein the model represents implementation of the possible treatment actions and the possible medical condition actions in one or more sequences. The system may solve the model to generate a treatment plan for the medical condition, wherein the treatment plan includes a set of possible treatment actions.

RELATED APPLICATIONS AND CLAIM OF PRIORITY

This application claims priority to U.S. Provisional Patent Application No. 61/677,477, titled “Drug Design and Treatment Planning via Sequential Games,” filed Jul. 31, 2012. The disclosure of the priority application is hereby incorporated by reference in its entirety.

BACKGROUND

When a medical professional such as a doctor, therapist or other medical professional (each of which may be referred to as a “doctor” in this document) meets with a patient, the doctor may assess the patient and arrive at a diagnosis for a medical condition. To address the condition, the doctor will develop and implement a course of treatment, which may include the administration of drugs, surgical procedures, additional tests, physical or mental therapy, lifestyle changes such as diet or activity restrictions, or other treatment elements.

Many medical conditions do not remain static over time. As the doctor and patient implement the course of treatment, the condition may improve, or it may become worse. In addition, additional medical conditions may arise, or the condition may remain static despite the fact that the treatment is expected to improve the condition.

Thus, improved methods of identifying and designing pharmaceutical or other courses of treatment are desirable. This document describes methods and systems that are directed to solving some or all of these issues.

SUMMARY

This document describes methods and systems that use sequential game models, and algorithms for solving them, for drug design and/or treatment planning, such as treatment of a patient over time. In some of the embodiments described below, the treating party may establish “traps” that seem promising for the adversary (e.g., a disease), such that as the disease evolves over time into a trap, the treating party may be able to successfully attack the disease.

In one embodiment, the system includes one or more processors and a non-transitory, computer-readable memory comprising one or more programming instructions that, when executed, cause one or more of the processors to implement a method of identifying a treatment plan. The system may identify a description of a sequential game, wherein the game is associated with treatment of a medical condition. The description may include one or more possible treatment actions that a treater can take to treat the medical condition, and one or more possible medical condition actions that the medical condition can take. The system may develop a model for the sequential game, wherein the model represents implementation of the possible treatment actions and the possible medical condition actions in one or more sequences. The system may solve the model to generate a treatment plan for the medical condition, wherein the treatment plan includes a set of possible treatment actions.

Optionally, when solving the model, the system may generate one or more contingent plans in the model. The treatment plan also may include randomization via behavioral or mixed strategies.

The treatment plan may include one or more traps, where the medical condition is likely to take actions so as to fall into a trap that causes the medical condition to go into one or more of the following states: a state in which the medical condition may be more easily treated; a state in which the medical condition is less virulent; a state in which the medical condition is less contagious; or a state from which the medical condition is less likely to evolve into a harmful state.

When solving the model, the system may apply an opponent model in which the medical condition is able to look ahead at most a set number of steps in the game; and it may create any number of paths for the medical condition in which a sequence of steps includes one or more steps within the set number that are attractive to the medical condition, and at least one step beyond the set number that is associated with a state of the medical condition that is better for the treater, a patient with the medical condition, or both.

Possible treatment actions that a treater can take to treat the medical condition may include actions to treat the medical condition at an individual level, a molecular level, or a population level. For example, the treatment actions may include an action to treat the medical condition at a molecular level via a de novo drug. As other examples, the treatment actions may include any of the following: prescribing or administering one or more drugs to a patient having the medical condition; performing a surgical procedure on the patient having the medical condition; applying a therapy to the patient; prescribing a lifestyle change to the patient; admitting the patient to a treatment facility; releasing the patient from the treatment facility; taking one or more measurements of the patient; or taking no action.

Optionally, the description of the sequential game may include one or more possible nature actions that a nature player may take relating to treatment of the medical condition, wherein each possible nature action is associated with a probability. If so, the model may represent occurrence of the possible treatment actions, the possible medical condition actions and the possible nature actions in one or more sequences.

Optionally, when solving the model, the system may use one or more game theory solution concepts and one or more utilities that are associated with outcomes, intermediate states, or transitions in sequential game play. A utility may include a function of a measurement of one or more of the following: health of a patient with the medical condition; a cost to the patient, the treater, or a third party payor; or a current state of the medical condition. For example, the system may implement an opponent modeling technique or an opponent exploitation technique. In addition or alternatively, the system may exploit an opponent as an opponent model is improved over time based on experience. In addition or alternatively, the system may compute a best-response strategy to an opponent model using stochastic programming techniques. These may include use of sample trajectory-based optimization and/or a policy gradient algorithm.

The system may present at least a portion of the treatment plan to a user via a user interface, and it may use information learned while using results of the model to develop an updated model.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an example representation of an incomplete-information game according to an embodiment.

FIG. 2 is a flow diagram that illustrates an example process for medical treatment planning using a sequential game.

FIG. 3 is a block diagram illustrating various elements of an example of a computing device.

FIG. 4 is a tree diagram illustrating an example of game play in the context of treating a disease such as HIV.

DETAILED DESCRIPTION

As used in this document, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. As used in this document, the term “comprising” means “including, but not limited to.”

In this document, the terms “computing device” and “processor” refer to a computer or other machine that performs one or more operations according to one or more programming instructions. Examples of computing devices include desktop computers, laptop computers, electronic tablets, ultrabooks, smart phones, smart televisions, and similar electronic devices having processing and user interface capability. Various elements of an example of a computing device or processor are described below in reference to FIG. 3.

The embodiments described below include methods and systems that use sequential game models, and algorithms for solving them, for drug design, dosage regimen planning and/or medical treatment planning. The models may be used to model how various courses of treatment may affect a medical condition over time, as well as how external factors can alter a course of treatment's ability to treat a medical condition.

In an embodiment, a medical condition may be a physical, physiological, mental or psychological condition of a patient. For example, a medical condition may refer to an illness or disease such as, for example, HIV, cancer, influenza, malaria, diabetes or schizophrenia. In an embodiment, a medical condition may refer to malnutrition, obesity and/or the like. A medical condition may be referred to in this document as a disease or illness.

In an embodiment, a treater may be a person who treats at least a portion of a medical condition. Example treaters may include, without limitation, a nurse, a doctor or other healthcare professional. In an embodiment, a treater may be a patient if the patient self-treats, or a pharmaceutical company if the company is developing drugs or drug regimens for administration to one or more patients. In the context of this document, a treater may be an actual human player who participates in a game and takes the actions of a treater, or it may be a virtual treater represented by the actions of a treater as automatically implemented by a system that is playing the game.

For example, consider a patient who is being treated by a treater for a medical condition. The treater's task may be to treat the patient over time. The treatments may include prescribing various drug regimens or combinations of drugs; performing a surgical procedure; applying physical, mental or other therapy to the patient; prescribing a lifestyle change such as an exercise plan or dietary change or restriction; and so on.

In an embodiment, the state of the disease may change over time. For example, if the patient is diagnosed with HIV, it is known that the HIV virus may mutate over time. How the disease changes over time can be affected by the course of treatment.

To address issues such as this, this document describes modeling a course of treatment over time as a game. The particular model used may vary, and there may be many alternative ways of modeling the state of the disease, the space of possible treatments, and how the treatments affect the disease and the patient over time.

The game may include sequential and/or simultaneous moves, and it may have complete or incomplete information about the disease. Incomplete information can represent the treater's lack of exact knowledge about what moves the disease has taken so far in the game. For example, the treater may not know all the mutations that an HIV virus has taken. Similarly, there may be (e.g., for the disease) incomplete information about what treatment actions have been taken. Furthermore, there can be various forms of stochasticity in the game, as will be described below.

Solving the game model would be expected to provide a good treatment plan. For example, if the course of treatment includes various drug designs or regimens, then if the course of treatment “wins” over the disease in the game, or optionally even if the course of treatment scores a partial victory (such as by improving the quality of life of the patient and/or extending the life of the patient), then the course of treatment may be considered to be a good one. Any now or hereafter known algorithm for solving a game model may be used. Examples include those that have been developed for solving various forms of poker in the incomplete-information game case. For instance, a leading approach for finding strong strategies for poker is to run an abstraction algorithm first (for information abstraction, action abstraction, phase abstraction, time abstraction, and/or other abstraction) to construct a game that is strategically equivalent or nearly equivalent but smaller and thus easier to solve, and then to run an equilibrium-finding algorithm (such as counterfactual regret minimization, excessive gap technique, fictitious play, etc.) on the abstracted game to find strategies for the players according to some solution concept. Examples also include algorithms that have been developed for solving various forms of complete-information games such as chess, checkers, Go, and general game playing (for which there is an annual competition for computers). These include minimax search, αβ-pruning, proof number search, conspiracy numbers, transposition tables, endgame tables (e.g., via dynamic programming-like approaches), expectimax search, Monte Carlo tree search (e.g., Upper Confidence bounds applied to Trees (UCT) and variations thereof), and so on. In some embodiments, the system may be agnostic to the solving methodology.
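As one illustration of the complete-information search techniques named above, the following Python sketch applies minimax search with αβ-pruning to a toy two-player, zero-sum tree in which the treater maximizes and the disease minimizes. The `Node` structure, the player labels, and the leaf utilities are hypothetical and are chosen only for illustration; the system is not limited to this solving methodology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of a toy game tree (hypothetical structure for illustration)."""
    player: str = "leaf"   # "treater" (maximizer), "disease" (minimizer), or "leaf"
    value: float = 0.0     # treater's utility; only meaningful at leaves
    children: List["Node"] = field(default_factory=list)


def alphabeta(node: Node, alpha: float = float("-inf"),
              beta: float = float("inf")) -> float:
    """Return the minimax value of `node` from the treater's point of view."""
    if node.player == "leaf":
        return node.value
    if node.player == "treater":
        best = float("-inf")
        for child in node.children:
            best = max(best, alphabeta(child, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:      # the disease would never allow this branch
                break
        return best
    best = float("inf")
    for child in node.children:
        best = min(best, alphabeta(child, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:          # the treater already has a better option
            break
    return best


def leaf(value: float) -> Node:
    return Node("leaf", value)


# The treater picks one of three treatments; the disease then responds.
tree = Node("treater", children=[
    Node("disease", children=[leaf(3), leaf(5)]),
    Node("disease", children=[leaf(2), leaf(9)]),
    Node("disease", children=[leaf(4), leaf(6)]),
])

print(alphabeta(tree))
```

In this toy tree the second treatment is cut off as soon as the disease's reply worth 2 is seen, because the first treatment already guarantees the treater a value of 3.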

Abstraction techniques may be especially useful if the present approach is used in large games (e.g., large state or action spaces). For example, in de novo drug design, the space of possible molecules is huge, even if molecules that are highly unlikely to work are removed from consideration up front, so action abstraction (i.e., bundling multiple actions into one representative one) can be useful for scalability.
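As a minimal sketch of action abstraction, the following Python example bundles candidate actions into buckets and keeps one representative action per bucket. The dosage values, the bucket width, and the function name `abstract_actions` are hypothetical and are used only for illustration; an actual embodiment may group actions (such as candidate molecules) by any suitable similarity measure.

```python
from typing import Callable, Dict, Hashable, List, Tuple, TypeVar

A = TypeVar("A")


def abstract_actions(
    actions: List[A], bucket_of: Callable[[A], Hashable]
) -> Tuple[Dict[Hashable, A], Dict[Hashable, List[A]]]:
    """Group actions into buckets and pick one representative per bucket."""
    buckets: Dict[Hashable, List[A]] = {}
    for action in actions:
        buckets.setdefault(bucket_of(action), []).append(action)
    # Use the middle member of each sorted bucket as its representative.
    representatives = {
        key: sorted(members)[len(members) // 2]
        for key, members in buckets.items()
    }
    return representatives, buckets


# Hypothetical candidate dosages (mg), bucketed in 50 mg bands.
doses_mg = [10, 20, 40, 50, 90, 100]
representatives, buckets = abstract_actions(doses_mg, lambda dose: dose // 50)

print(representatives)
```

The abstracted game then offers the treater three dosage actions instead of six, at the cost of a coarser plan.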

One idea is that in order to solve game models in the context of systems described in this document, one can use incremental abstraction where the steps of abstraction and solving of the abstracted game are iterated (interleaved) multiple times. This way the solution of the game can inform where in the game model finer-grained abstraction is needed (and/or is affordable from a scalability perspective) and where the abstraction can be and/or needs to be coarsened. Note that abstraction and iterated abstraction can be used both when solving the game model using a game-theoretic solution concept and when solving the game model using opponent modeling/opponent exploitation.

One abstraction-related idea here is to make the actions in the game model higher-level concepts so as to reduce the size of the game model and make it more tractable (e.g., faster and/or less demanding in memory) to solve. In one embodiment, these higher-level actions can be behaviors, e.g., short or simple plans or plan snippets. In another embodiment, for example for drug design, the actions can include adding or subtracting bigger pieces than individual atoms (such as chains, cycles, groups, or even bigger pieces) to and from the drug molecule.

During the sequential and/or simultaneous moves of game play, the treater may implement any suitable treatment action, such as prescribing and/or administering one or more drugs or drug cocktails, exercise or therapy regimens or other treatment actions for the patient; admitting the patient to a treatment facility; releasing the patient from a treatment facility; measuring one or more aspects of the patient (such as pulse, heart rate, cholesterol level, structure and quantity of viruses and bacteria, and so on); or choosing to take no action for a period of time and/or until the disease takes its next action.

The treater can also use a strategy that tries to trap the opponent (adversary, e.g., a disease) with one or more traps. The goal is that the disease moves (e.g., evolves) over time into a trap state that seems promising to the disease, but instead is a state such that the treater can successfully attack the disease. For instance, the treater can use an opponent model to model how the opponent is likely to play irrationally into a trap. For example, consider an opponent (disease) that cannot look ahead more than a set number (k) of steps in the game tree. In that setting, the treater can make the disease go down a path that is eventually good for the treater by using a treatment strategy such that the early parts of the trap paths (e.g., all of them, most of them, or a high-probability set of them) that lie within the lookahead horizon (i.e., the set number k of steps) are desirable to the disease (e.g., they have high utilities, discussed below, for the disease), but such that the later parts of those paths (all, most, or a high-probability set of them) that lie beyond the lookahead horizon are good for the treater and/or the patient. The opponent model can also include different lookahead capability for the opponent down different paths of the game tree, that is, the opponent may be able to look ahead deeper on some paths than others. Note that when the game is played down a path, the treater may take actions that temporarily make the patient worse in order to achieve a better end result for the treater and/or patient. For example, in the context of HIV, the treater may use drug cocktails in the early parts of the game that are likely to cause the virus to mutate in directions that are not immediately better for the patient, but which can be tackled effectively in later parts of the paths with other drugs. Note that this is in stark contrast with most current practices, where treatments are selected in order to myopically improve the patient's health. When the treater's possible actions include de novo (drug) molecules, the plan (treater's strategy) that the system outputs may include de novo drugs (which may be referred to as “trap drugs”) whose main role is to trap the disease rather than immediately making the patient better.
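The limited-lookahead opponent model described above can be sketched in Python as follows. In this sketch the treater's strategy is assumed to be fixed and already reflected in the tree, so that each edge carries only the disease's immediate utility; the node names, action names, and utility values are hypothetical. A disease that looks ahead k=2 steps follows the path that is attractive early and costly later (the trap), whereas a disease that could look ahead k=3 steps would avoid it.

```python
from typing import Dict, List, Tuple

# Each node maps a disease action to (immediate disease utility, next node).
# All names and numbers are hypothetical and chosen only for illustration.
GameTree = Dict[str, Dict[str, Tuple[float, str]]]

TREE: GameTree = {
    "root": {"mutate_A": (2.0, "a1"), "mutate_B": (1.0, "b1")},
    "a1": {"persist": (2.0, "a2")},
    "a2": {"persist": (-10.0, "end")},   # the trap closes beyond two steps
    "b1": {"persist": (1.0, "b2")},
    "b2": {"persist": (1.0, "end")},
    "end": {},
}


def lookahead_value(tree: GameTree, node: str, k: int) -> float:
    """Best total disease utility obtainable within the next k steps."""
    if k == 0 or not tree[node]:
        return 0.0
    return max(reward + lookahead_value(tree, nxt, k - 1)
               for reward, nxt in tree[node].values())


def choose_action(tree: GameTree, node: str, k: int) -> str:
    """Action the disease prefers when it can see only k >= 1 steps ahead."""
    return max(
        tree[node],
        key=lambda a: tree[node][a][0]
        + lookahead_value(tree, tree[node][a][1], k - 1),
    )


def play(tree: GameTree, k: int, start: str = "root") -> Tuple[List[str], float]:
    """Follow the k-step-lookahead disease to a terminal node."""
    node, path, total = start, [], 0.0
    while tree[node]:
        action = choose_action(tree, node, k)
        reward, node = tree[node][action]
        path.append(action)
        total += reward
    return path, total


print(play(TREE, k=2))   # falls into the trap
print(play(TREE, k=3))   # sees the trap and avoids it
```

A fuller model would also let the treater choose which tree to present, that is, search over treatment strategies for one whose induced disease path ends in a state that is good for the treater and/or the patient.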

For example, if a medical condition is a certain disease, then a trap may be a state in which the disease can be destroyed, or in which it becomes less powerful, less virulent, or less contagious, or cannot (or is less likely to) evolve in a malicious way. As an example, a trap may be a first drug or a treatment regimen that makes one or more aspects of the patient's condition worsen for a short period of time, or which allows the disease to flourish for a short period of time, but which, after implementation, enables the treater to select a second drug or treatment regimen that will significantly improve the patient's health.

In an embodiment, the system may be applied at one or more levels to battle a disease. For example, the system may be applied at an individual level, at a molecular level and/or at a population level. This document will describe various embodiments within each level, but the examples are not to be considered restrictive.

Applying the system at an individual level may involve the treatment of an individual patient. For instance, consider the treatment of a patient who is infected with the HIV virus. At any point in the game, the treater may take one or more actions such as: (i) applying one or more treatments (such as which drug or drug cocktail mixture to use, when to bring the patient to hospital, when to release the patient, and/or the like), (ii) taking one or more measurements (such as measurements associated with a blood test or external observations such as weight and pulse), (iii) performing other actions; and/or (iv) taking no action. At each point in the game, the disease (e.g., HIV) may take one or more actions such as evolving the disease within the patient (e.g., evolving the pool of different forms of HIV viruses existing in the patient), making the patient worse or better in various ways, or taking no action.

When solving a game, the system may apply a game-solving algorithm to a model with one or more utilities. A utility is a representation of a player's welfare at any point in the game. In some embodiments, utilities may be associated with one or more outcomes, intermediate states, and/or transitions in the game. For example, if utilities are associated with outcomes, each of several outcomes may be assigned a numeric value of utility, with higher numbers representing more preferred outcomes. Utilities may be based on the patient's actual and/or projected health (optionally including side effects); an assessment of a state of the disease such as the disease's level of virulence, level of contagiousness, or how easily attackable the disease is in its current state (e.g., by a drug or drug regimen); how likely the disease is to evolve from the current state to malicious states; and/or the current and/or expected future cost of treatment and/or other costs to the treater, patient or third party payor. The treater's utility for any outcome, intermediate state or transition in the game may also include, for example, a measurement of utility that considers the patient mortality rate and/or financial costs of treatment associated with the item.

In an embodiment, one or more actions may be associated with one or more parameters. A parameter may be a feature or characteristic of an action. Example parameters may include, without limitation, a duration of an action, a type of regimen to which the action pertains (such as, for example, a dosage regimen, an exercise regimen, or a dietary regimen or guidelines) and/or the like.

The output of the model may be a plan for treatment. In game theory, a plan is often called a “strategy.” In some scenarios, the output may include one or more contingent plans. Contingent plans are sometimes known as online control policies, in that they prescribe different actions based on the results of observations. Unlike single-shot games, the sequential games considered here allow a player's strategy to include a sequence of actions. Even more generally, a player's strategy can be a contingent plan, that is, the player's probability distribution over next actions to take may depend on the player's observations about how the game has played out so far (and any private information and beliefs the player may have). A full contingent plan may be generated in advance before treatment begins, or the planning may be done incrementally by interleaving planning and execution.

One way of tackling the game model is opponent exploitation (e.g., where the opponent is the disease) beyond what any equilibrium strategy can accomplish. It was folk wisdom in game theory that one cannot exploit an opponent safely (that is, without exposing oneself to exploitation, or risking doing worse than an equilibrium strategy in expectation) beyond what the best equilibrium strategy can accomplish. However, in the system described in this document, one accepts that safe opponent exploitation is possible.

A plan can serve as a treatment plan that is implemented in a patient, or it can serve as a recommendation for a party such as a doctor who makes the final decision. The system may, in some embodiments, produce two or more alternative solutions for the medical professional and/or patient to select among. For example, the system may propose a less radical treatment plan with a light drug cocktail that may be associated with a shorter life expectancy but a higher quality of life than an alternative, more aggressive drug cocktail associated with a longer life expectancy.

Applying the system at a molecular level may involve developing a treatment for generic patients or for a limited set of prototypical patients. The actions of the treater at any point in the game may include what drug or drug cocktail to use, an amount of the drug or drug cocktail that should be used and/or the like. The actions may include choosing a cocktail of existing drugs. The actions can also include de novo drug designs; for example, the actions can include new molecules. This is one way in which the present invention can be used for drug design, since the output of the system will include a plan that may include one or more de novo drugs. The actions can also include conducting tests on the patient and/or the virus population in the patient, and/or the like.

The actions of the disease at any point in the model may include the most likely mutations and the most likely mutating locations or binding sites. A model may be used to predict how well one or more of the treater's actions addresses one or more of the disease's actions. For example, if a treater's action is to prescribe a drug cocktail, a model may be used to predict how well each of the drugs in the cocktail would bind to each mutation at each binding site. The output of the model may be a plan of treatment over a period of time that may include one or more contingencies.

Applying the system at a population level may involve developing or identifying an appropriate course of treatment for a population of patients or potential patients. For example, applying the system at a population level may involve developing a course of treatment for an influenza epidemic. The actions of the disease at any point in the game model may include spread of the influenza strains (possibly including mutation) to different parts of the population. This is unlike the current way of treating influenza in the United States, where a single vaccine is developed per year for the entire flu season, and the choice is merely whether or not to vaccinate a person. At any point in the game model, the actions the treater may take may include, without limitation, determining a drug or drug cocktail to use in one or more parts of a population, or determining whether one or more parts of the population should be hospitalized, quarantined, and/or the like. The treater's possible actions may also include the selection from a potentially unrestricted space of de novo molecules (so that drug design can be incorporated within the population-level game). The treater's actions may also include conducting tests on patients from various subsets of the population, and/or testing one or more aspects of the virus within the patients. The treater's utility could be based on, for example, a mortality rate or one or more costs such as hospitalization costs.

In an embodiment, the output of the model may be a treatment plan over a period of time. The plan may detail how the treater changes the treatment or testing over time in each portion or segment of the population. The plan also may have contingencies at various points, where the next step in the plan after that point will depend on certain parameters, such as test results.

While the example described above applies the system at the population level, it is possible that various additional levels can be used. For example, when battling a disease at the population level, the system could also oppose the disease at the molecular level. Such an embodiment could be helpful if the virus is new so there is little experience in how it behaves in the population.

FIG. 1 illustrates an example representation of an incomplete-information game according to an embodiment, and the discussion below will describe how a game may be solved. As illustrated by FIG. 1, each node 100a-N represents the player whose turn it is to move. FIG. 1 illustrates a two-player, zero sum game. However, it is understood that additional players may participate in the game, and that the game may not be a zero sum game.

In an embodiment, uncertainty in the game may be represented in FIG. 1 by a player referred to as “Nature.” A Nature player 102 may make moves based on fixed probabilities rather than strategic moves. For example, as illustrated by FIG. 1, there may be a 30% chance that Nature player 102 makes move 108, a 50% chance that Nature player makes move 110 and a 20% chance that Nature player makes move 112.
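Because Nature moves with fixed probabilities, the value of a Nature node to a player is the probability-weighted average of the values of its successors. The following Python sketch computes such an expected utility using the 30%, 50% and 20% probabilities of moves 108, 110 and 112; the utility values assigned to the three moves are hypothetical and are not taken from FIG. 1.

```python
import math
from typing import List, Tuple


def expected_utility(outcomes: List[Tuple[float, float]]) -> float:
    """Expected utility at a chance node given (probability, utility) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("Nature's move probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


# (probability of Nature's move, hypothetical treater utility after that move)
nature_moves = [
    (0.30, 10.0),   # move 108
    (0.50, 4.0),    # move 110
    (0.20, -5.0),   # move 112
]

print(expected_utility(nature_moves))   # 0.3*10 + 0.5*4 + 0.2*(-5)
```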

Incomplete information is represented in FIG. 1 by information sets 104, 106. In an embodiment, an information set is a collection of one or more nodes in a game tree such that the player whose turn it is to move at the information set does not know which node of the information set is the actual game state at that point.

The game model may also include taking actions over time, where there is no pre-specified order in which the players are supposed to move. The actions of the players may also include doing some action for a specified period of time or at a specified point in time. For example, the treater's actions may include applying a specific drug cocktail starting at a certain time and ending at another time.

Other representations of the game can also be used. Other general game representations can be used, such as, without limitation, the normal form (also known as the strategic form or matrix form), sequence form, graphical game, and action-graph game. Other compact or application-specific game representations can also be used.

In an embodiment, solving a game model, such as that illustrated in FIG. 1, may result in a treatment plan. In an embodiment, solving a game model may involve one or more solution concepts. A solution concept may be one or more rules for predicting how a game will be played.

Example types of game theory solution concepts include, without limitation, Nash equilibrium, subgame perfect equilibrium, perfect Bayesian equilibrium, sequential equilibrium, trembling-hand perfect equilibrium, extensive-form perfect equilibrium, extensive-form proper equilibrium, admissible strategies, normal form perfect equilibrium, quasi-perfect equilibrium, normal form proper equilibrium, and correlated equilibrium. Approximate versions of any of the foregoing may be used within the scope of this disclosure.

In an embodiment, if a game model has more than two players, then solution concepts having to do with coalitions, such as, for example, strong Nash equilibrium, coalition-proof Nash equilibrium, strong correlated equilibrium and other variations may be used.

In an embodiment, if a game is modeled as a non-cooperative game, the solution concept may define which strategy profile and beliefs (e.g., probability distributions over nodes within each information set) constitute solutions to the game. In other words, a solution concept may identify one or more strategy profiles that are reasonable solutions for “rational” players to use.

In an embodiment, a strategy profile may include one strategy for each player. Each player's pure strategy may be a contingent plan that selects an action (or deliberate inaction) based on what has transpired in the game so far, such as, for example, the path of play by all the players and Nature and the time that has elapsed. Typically a player cannot condition his or her action on actions from the past that he or she has not observed. Therefore, typically each player has to decide his or her action based on the information set. So, a pure strategy for a player may prescribe one action per information set, for those information sets where it is that player's turn to move, although other variations are possible.

In an embodiment, a player's mixed strategy may be the player's probability distribution over the player's pure strategies. In an embodiment, behavioral strategies that assign probability distributions to actions at each information set may be used. In these ways, a player may act with randomization.
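A behavioral strategy can be sketched as a table that maps each information set to a probability distribution over the actions available there. In the following Python example, the information set labels, action names, and probabilities are hypothetical; a pure strategy appears as the special case in which one action has probability 1 at every information set.

```python
import random
from typing import Dict

# Maps information set -> {action: probability}. All entries are hypothetical.
BehavioralStrategy = Dict[str, Dict[str, float]]

treater_strategy: BehavioralStrategy = {
    "I1": {"cocktail_X": 0.7, "cocktail_Y": 0.3},
    "I2": {"measure_viral_load": 1.0},   # deterministic (pure) choice here
}


def sample_action(strategy: BehavioralStrategy, info_set: str,
                  rng: random.Random) -> str:
    """Draw one action according to the distribution at `info_set`."""
    actions = list(strategy[info_set])
    weights = [strategy[info_set][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)   # seeded only so that repeated runs are reproducible
draws = [sample_action(treater_strategy, "I1", rng) for _ in range(1000)]
print(draws.count("cocktail_X") / len(draws))   # close to 0.7
```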

The information sets in a model may include data that describes possible actions of a disease and/or treater. Such data can be provided by any suitable source, such as medical and/or scientific literature, input by a treater and/or patient, results databases, disease evolution models, clinical trial results, and so on.

In some embodiments, the Nature player may play a role in the game by introducing stochasticity. For example, nature can introduce stochasticity in the patient's state and in test results (probability of each reading conditional on the true state). The moves that nature can make (and the probability distribution over those moves) for points in the game where it is nature's turn to move can be generated from scientific papers or databases of results, disease evolution models and simulations, tests on humans or animals, past experience about the disease on a particular patient or segment of patients, experience gathered about the disease while using the system, active learning, available data on the probabilistic errors that given tests have, and so on. The learning and information extraction and/or information fusion can be done using machine learning techniques or manual approaches.
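The probability of each test reading conditional on the true state can be used to update the treater's beliefs over the possible true states by Bayes' rule. The following Python sketch shows one such update; the state names, prior beliefs, and test error rates are hypothetical values chosen only for illustration.

```python
from typing import Dict


def update_belief(prior: Dict[str, float],
                  likelihood: Dict[str, Dict[str, float]],
                  reading: str) -> Dict[str, float]:
    """Posterior over true states after observing one test reading.

    likelihood[state][reading] is the probability of `reading` given `state`.
    """
    unnormalized = {s: prior[s] * likelihood[s][reading] for s in prior}
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("reading has zero probability under the model")
    return {s: v / evidence for s, v in unnormalized.items()}


# Hypothetical prior belief over which viral variant dominates in the patient.
prior = {"wild_type": 0.8, "resistant": 0.2}

# Hypothetical error model of a resistance test (rows sum to 1).
likelihood = {
    "wild_type": {"positive": 0.1, "negative": 0.9},
    "resistant": {"positive": 0.9, "negative": 0.1},
}

posterior = update_belief(prior, likelihood, "positive")
print(posterior)   # resistant: 0.18 / (0.08 + 0.18)
```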

In some embodiments, the moves that the treater can make, as contained in any given information set for the treater, may include standard treatments for the condition, and/or new potential treatments that the treater wants the game-solving system to consider as possible parts of the treatment plan that the system outputs. The standard treatments can come from any suitable source such as guidelines, common practice, scientific papers or databases, and so on.

In an embodiment, in some situations, a game-theoretic approach may be too conservative. For example, a game-theoretic approach may be too conservative in settings where it is known or believed that the opponent will not behave in the worst possible way. In these situations, opponent modeling and opponent exploitation may be used.

In an embodiment, an opponent model may predict what an opponent would do in various information sets. An opponent model may be generated from result sets, disease evolution models and simulations, experiments, trials that test treatments and/or contingency plans for treatment, past experience about an opponent, experience learned about an opponent while using the system, active learning and/or the like. In the example of HIV, an opponent model may be generated based on data describing which antivirals tend to cause specific mutations in reverse transcriptase, protease or integrase (e.g., in the form of a probability table), and/or data on efficacy of other antivirals against such mutants. The system may also develop and/or update the opponent model based on information learned while playing the game (e.g., executing the treater's plan together with the opponent's strategy and nature's strategy, in the physical world or in simulation). The opponent model can be generated automatically using a host of different possible algorithms and/or using manual approaches.

There are many ways of using learning (automated and/or manual) to construct and/or refine the opponent model and the model about the nature player and the game itself (e.g., the game's structure and the utilities in the game). For example, on the simple end, if one observes the opponent or nature taking an action that is not in the model, one can add that action to the model. Also on the simple end, as one observes an action by the opponent or by nature, one can update the counter for that action for that state of the game (of course, one may have to aggregate this information across states that one cannot distinguish among, for example, due to incomplete information). Then, one can use the counter-based action frequencies at that state as the model of how the opponent or nature is likely to behave at that point of the game. There are many further possible improvements to the learning. For example, one can use machine learning techniques to generalize what is learned across states. As another example, one can assume first that the opponent behaves rationally according to game theory, and then start adjusting the opponent model toward the observed behavior of the opponent as more knowledge is gained about the opponent's actual behavior from observations or from new scientific knowledge from books, papers, or databases.
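The counter-based approach described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the class and method names are hypothetical, and counts are keyed by information set so that observations from states the treater cannot distinguish are aggregated automatically.

```python
from collections import Counter, defaultdict


class FrequencyOpponentModel:
    """Counter-based model of how the opponent (or nature) acts at each point."""

    def __init__(self):
        # One counter of observed actions per information set.
        self._counts = defaultdict(Counter)

    def observe(self, info_set, action):
        # An action not yet in the model is added simply by counting it.
        self._counts[info_set][action] += 1

    def action_probabilities(self, info_set):
        # Empirical action frequencies; empty if nothing was observed there.
        counts = self._counts.get(info_set)
        if not counts:
            return {}
        total = sum(counts.values())
        return {action: n / total for action, n in counts.items()}


model = FrequencyOpponentModel()
for action in ["mutation_A", "mutation_A", "no_mutation", "mutation_A"]:
    model.observe("after_cocktail_1", action)
frequencies = model.action_probabilities("after_cocktail_1")
```

A refinement along the lines suggested above could initialize the counters from a game-theoretic strategy, so that observed behavior gradually overrides the rational-opponent assumption.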

An opponent model can also combine frequentist approaches (such as the ones described in the previous paragraph) with assumptions about the opponent's ability to conduct only limited lookahead (such as the approaches described earlier in this document).

In an embodiment, a player may start by playing game-theoretically and then adjust play toward exploiting an opponent as a more robust opponent model is developed over time based on experience. An example algorithm that may be applied to this approach is described in, for example, “Game theory-based opponent modeling in large imperfect-information games,” Sam Ganzfried and Tuomas Sandholm, International Conference on Autonomous Agents and Multi-Agent Systems, AAMAS, 2011.

In an embodiment, the system may identify an e-safe best response, or approximation thereof. An e-safe best response is one that will do at most a predefined e worse (in terms of utility) than a game-theoretic strategy. This strategy may exploit a model of the opponent maximally, subject to the constraint that even against the worst-case opponent, it will do at most e worse than a game-theoretic strategy. Typically, although not necessarily, the e in the e-safe best response is measured in terms of an expectation over all the players' (including nature's) possibly randomized strategies. One other way to measure e is to take an expectation over some (or none) of the players, considering the worst case of the other players' strategies.
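The safety constraint above can be illustrated on a small zero-sum matrix game. The Python sketch below is a simplification under several assumptions: the treater chooses among a finite list of candidate mixed strategies rather than optimizing over all of them, the game value is assumed to have been computed elsewhere, and all names and payoffs are hypothetical.

```python
def expected_utility(p, payoff, q):
    """Treater's expected utility when playing mix p against opponent mix q."""
    return sum(p[i] * payoff[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))


def worst_case_utility(p, payoff):
    """Treater's utility against the most damaging opponent action."""
    n_cols = len(payoff[0])
    return min(sum(p[i] * payoff[i][j] for i in range(len(p)))
               for j in range(n_cols))


def eps_safe_best_response(candidates, payoff, opponent_model, game_value, eps):
    """Exploit the opponent model as much as possible while staying within eps
    of the game value even against a worst-case opponent."""
    tolerance = 1e-12  # guards against floating-point rounding
    safe = [p for p in candidates
            if worst_case_utility(p, payoff) >= game_value - eps - tolerance]
    if not safe:
        raise ValueError("no candidate strategy satisfies the safety constraint")
    return max(safe, key=lambda p: expected_utility(p, payoff, opponent_model))


# Hypothetical 2x2 zero-sum game with value 0 for the treater (row player).
payoff = [[1.0, -1.0], [-1.0, 1.0]]
candidates = [(0.5, 0.5), (0.6, 0.4), (0.75, 0.25), (1.0, 0.0)]
opponent_model = (0.8, 0.2)  # the model predicts the first column 80% of the time
chosen = eps_safe_best_response(candidates, payoff, opponent_model, 0.0, 0.25)
```

With these numbers, a safety budget of 0.25 admits the candidate `(0.6, 0.4)`, which leans toward exploiting the modeled opponent, while a budget of zero leaves only the equilibrium mix `(0.5, 0.5)`.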

In an embodiment, the system computes an exploitative (e.g., in the sense of exploiting the opponent more than any game-theoretic equilibrium strategy can) strategy that is safe, that is, no worse than a game-theoretically optimal strategy. This is possible if the opponent makes mistakes, i.e., plays worse than a fully rational game-theoretic player would. In zero-sum game settings, the utility (measured typically, but not necessarily, as an expectation over the players' and nature's randomized strategies) that the disease forgoes by making a mistake is a gift to us as the treater. In this context, the sum of the gifts that the opponent has given the player (minus any gifts that the opponent may have received from the other player) may be represented by e. Then, the system can use an e-safe best response and still be absolutely safe. However, one may wish to separate out the gifts (which are due to the opponent's mistakes) from luck (i.e., lucky draws of the randomizations). One aspect of this fully safe opponent exploitation technique is that it does not require one to be able to compute the sum of the gifts exactly: a lower bound suffices to guarantee safety. This also means that it is possible to use the technique even if one is not sure that one's game model is exactly accurate.

In an embodiment, a set of strategies may be computed. It may then be determined which strategy performs best against an opponent based on simulated or real world learning. In an embodiment, no-regret learning algorithms may be used to perform well not only in the end but also throughout the learning process.

In an embodiment, a best-response strategy to an opponent model (and the model of nature if the nature player is part of the game), or an approximation thereof, may be used. A best-response strategy may be one that produces the highest utility for a player, given the other players' (including nature's) strategies. Utility is typically measured in expectation, but it can also be measured in terms of the worst case or other measures. For example, one can make worst-case assumptions about some of the other players' strategies (possibly only at some points of the game, e.g., ones where we do not have much knowledge about the opponent's behavior) and/or nature's actions (possibly only at some points of the game, e.g., ones where we do not have much knowledge about nature's action probabilities). To find one or more such solutions or approximations thereof, techniques from stochastic programming (sometimes also called dynamic optimization and sometimes also called stochastic optimization) may be leveraged. The stochastic programming techniques applicable here include both exact and approximate approaches. They also include both offline techniques, where the plan is generated up front before starting to execute it, and online approaches, where the plan is generated in pieces (typically one action at a time), interleaving planning and execution. Example algorithms that may be applied in this setting may include, without limitation, sample trajectory-based optimization techniques and policy gradient algorithms.

In sample trajectory-based algorithms, possible paths of the future (in the game model in this context) are drawn. These are called sample trajectories or scenarios. Then a plan is computed that does well (typically in the sense of utilities weighted by probabilities, but other measures such as more risk averse ones can also be used) across many of those sample trajectories. There are various algorithms for doing this computation. Some of them consider all of the scenarios simultaneously. Others make a tentative plan for each scenario separately and then use various methods for aggregating those plans into an overall plan.
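A minimal Python sketch of the variant that considers all scenarios simultaneously is given below. It assumes, purely for illustration, that the disease's actions at each step are drawn independently from a fixed distribution, that candidate plans are fixed sequences of drugs, and that a per-step effectiveness table is available; the names and numbers are hypothetical.

```python
import random


def sample_scenarios(disease_action_probs, horizon, n_scenarios, rng):
    """Draw sample trajectories of disease actions, one action per step."""
    actions = list(disease_action_probs)
    weights = [disease_action_probs[a] for a in actions]
    return [tuple(rng.choices(actions, weights=weights, k=horizon))
            for _ in range(n_scenarios)]


def choose_plan(plans, scenarios, utility):
    """Pick the plan with the highest average utility across all scenarios."""
    def average_utility(plan):
        return sum(utility(plan, s) for s in scenarios) / len(scenarios)
    return max(plans, key=average_utility)


# Hypothetical effectiveness of each drug against each disease action.
EFFECT = {
    ("drug_A", "no_mutation"): 1.0, ("drug_A", "mutation_X"): 0.0,
    ("drug_B", "no_mutation"): 0.6, ("drug_B", "mutation_X"): 0.8,
}


def utility(plan, scenario):
    # Sum of per-step effectiveness of the planned drug on the sampled action.
    return sum(EFFECT[(drug, action)] for drug, action in zip(plan, scenario))


plans = [(a, b) for a in ("drug_A", "drug_B") for b in ("drug_A", "drug_B")]
rng = random.Random(0)
scenarios = sample_scenarios({"no_mutation": 0.5, "mutation_X": 0.5}, 2, 200, rng)
best_plan = choose_plan(plans, scenarios, utility)
```

A more risk-averse measure could be substituted by replacing the average in `choose_plan` with, for example, the minimum or a low quantile over the scenarios.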

In policy gradient methods, the plan is parameterized by a (typically relatively small) number of parameters that control what the plan does. Thereby the computation to determine a plan is simplified to a computation that tries to optimize (approximately or exactly) the parameters.
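To make the idea concrete, the following Python sketch parameterizes a one-step plan by one preference parameter per treatment, turns the parameters into action probabilities with a softmax, and follows the exact gradient of expected utility. This is a deliberately small illustration: real policy gradient methods usually estimate the gradient from sampled trajectories, and the utilities, step count and learning rate here are hypothetical.

```python
import math


def softmax(theta):
    """Convert preference parameters into action probabilities."""
    m = max(theta)  # subtract the maximum for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def policy_gradient_ascent(utilities, steps=500, lr=0.5):
    """Gradient ascent on expected utility of a softmax policy.

    For a softmax policy, d E[U] / d theta_a = pi_a * (u_a - E[U]).
    """
    theta = [0.0] * len(utilities)
    for _ in range(steps):
        pi = softmax(theta)
        expected = sum(p * u for p, u in zip(pi, utilities))
        theta = [t + lr * p * (u - expected)
                 for t, p, u in zip(theta, pi, utilities)]
    return softmax(theta)


# Hypothetical expected utilities of three candidate treatments.
policy = policy_gradient_ascent([1.0, 3.0, 0.0])
```

The optimization only touches the three parameters, which is the simplification the paragraph above describes; the resulting policy concentrates its probability on the treatment with the highest utility.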

In another embodiment, one can use a hybrid of solving the game model using a game-theoretic solution concept and solving it using opponent modeling/exploitation. For instance, one can assume that the opponent plays according to the opponent model in points of the game where one has a significant amount of statistical information about the probability distribution over the actions that the opponent takes at that point, and assume that the opponent plays game theoretically at other points of the game.

In another embodiment, if the strategy that is computed for the treater is randomized, the amount of randomization is decreased before the strategy is output or before it goes into implementation. The reduction can be done, for example, by rounding the probabilities that are less than some threshold to zero, and scaling up the other probabilities at that information set accordingly so that they sum to one. As an extreme case, one can simply use the highest probability action/strategy. Such reductions of randomization have been found helpful even in some poker strategies, but in the context of this document, reduction of randomization may be particularly useful because the opponent is not rational or deliberative. So, there is less need to worry that one's actions signal too much about one's private information to the opponent, which is typically the main motivation for randomization in games.
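The thresholding step can be sketched as follows in Python. The function name and the example probabilities are hypothetical; the fallback to the single most probable action when every probability falls below the threshold is an assumption that corresponds to the extreme case mentioned above.

```python
def reduce_randomization(strategy, threshold):
    """Zero out low-probability actions at one information set and rescale.

    strategy: dict mapping action -> probability (summing to one).
    """
    total = sum(p for p in strategy.values() if p >= threshold)
    if total == 0.0:
        # Every action is below the threshold: keep only the most likely one.
        best = max(strategy, key=strategy.get)
        return {a: (1.0 if a == best else 0.0) for a in strategy}
    # Surviving probabilities are scaled up so that they again sum to one.
    return {a: (p / total if p >= threshold else 0.0)
            for a, p in strategy.items()}


mixed = {"cocktail_1": 0.55, "cocktail_2": 0.40, "no_action": 0.05}
reduced = reduce_randomization(mixed, threshold=0.1)
```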

FIG. 2 illustrates an example of a process flow that a medical treatment or drug design planning system may implement. To develop a course of treatment for a medical condition, which may include the design and/or administration of one or more drugs or drug regimens, exercise regimens, other therapies and the like, a computer-implemented system may present a user interface that implements a model as a sequential game that receives input from a treater. Alternatively, the system may implement the game automatically with a virtual treater and produce a recommended course of treatment or drug design as an output.

The system may identify a description of a game by receiving information corresponding to the medical condition (step 201). The system may receive this information via a user interface or communications port from a doctor, patient, researcher or other individual or system having information about the medical condition. The system also may receive, via the user interface or a communications port, one or more possible treatment actions for the medical condition (step 202). The possible treatment actions also may include a sequence for the actions, and one or more parameters for each action. The system will also receive one or more possible medical condition actions that a disease or other medical condition may take (step 203) in response to a treatment action or other input or influence.

After receiving the treatment actions and medical condition actions, the system may implement programming instructions to develop a first model for a sequential game (step 204). The system may do this by accessing a data storage facility, identifying an appropriate model for the course of treatment and retrieving the identified model. Alternatively, it may build a new model. For example, the model may include possible medical condition actions such as the medical condition mutating in one or more ways, evolving in one or more ways, and/or the like. Treatment actions in the model may include applying one or more treatments, taking one or more measurements, taking no action and/or the like.

In an embodiment, one or more possible actions that a medical condition and/or a treater may take may be based on one or more considerations. A consideration may be a state, status or other condition of a patient, a medical condition, a treatment and/or the like. Example considerations may include, without limitation, a current status of a patient's health, a projected health of a patient, how virulent a medical condition is, how contagious a medical condition is, how easily attackable or treatable a medical condition is in its current state, a cost of treatment, a projected future cost of treatment, other costs and/or the like.

The model also may include an information set associated with the treater, an information set associated with the medical condition, or both. The information set for either player may include one or more actions that the player could take based on any suitable parameters such as medical condition state, previous actions taken, time elapsed, or other parameters.

In an embodiment, the system may select medical condition actions and/or treater actions based on user input, based on commands received by the system from another system, by implementing rules or other functions of a model, or via automatic or random selections by implementing programming instructions and/or the like. The model represents implementation of the set of actions in accordance with the sequence and the parameters.

To solve the model (step 205), the system may apply a game-solving algorithm to a model with one or more utilities. Any medical condition action may be responsive to an immediately-received treatment action (and ones that were prior, but not immediately prior), or vice versa. After each treatment action, the model may include information that the system uses to predict an updated status of the medical condition based on the application of the treatment action to the medical condition. Predicting the updated status may include determining an expected patient response to the treatment action. Because the model may include one or more contingent plans, any medical condition action and/or treatment action may depend on one or more parameters, such as the previous action taken by the other player and/or parameters from an information set. Also, as described above, when solving the model, a treatment strategy may include the setting of a trap that may lure the medical condition into a state in which the condition may be more easily treated, less virulent, less contagious, or otherwise in a more preferable state for the patient and/or others. Suitable methods of solving may include opponent modeling techniques, opponent exploitation techniques, or techniques for solving according to game theory solution concepts such as those described above.

The system may present an output as a treatment plan (step 206) (i.e., the treater's strategy, which may be a contingent plan) that includes the set of medical condition actions and treatment actions. The treatment plan may include the actions, the traps, dosage regimen implementation, medical treatment planning, and/or other information that led to the result.

Although the description of game play described above used the example of application of treatment actions on an individual level, as noted above the actions and game play also may occur at a molecular level (such as would be the case where the “treatment plan” includes a design for a new drug), or at a population level.

The goal of the game may be to identify what treatment plan (i.e., strategy) is expected to result in a state in which the medical condition may be more easily treated, a state in which the medical condition is less virulent, or a state in which the medical condition is less contagious. The system may prepare and output a report of these treatment actions or the complete treatment plan for implementation by a treater in real life, outside of the game.

FIG. 3 depicts a block diagram of an example of internal hardware that may be used to contain or implement program instructions, such as the process steps discussed above, according to embodiments. A bus 300 serves as an information highway interconnecting the other illustrated components of the hardware. CPU 305 represents one or more processors of the system, performing calculations and logic operations required to execute a program. CPU 305, alone or in conjunction with one or more of the other elements disclosed in FIG. 3, is an example of a processing device, computing device or processor as such terms are used within this disclosure. Read only memory (ROM) 310 and random access memory (RAM) 315 constitute examples of memory devices or processor-readable storage media.

A controller 320 interfaces one or more optional tangible, computer-readable memory devices 325 to the system bus 300. These memory devices 325 may include, for example, an external or internal disk drive, a hard drive, flash memory, a USB drive or the like. As indicated previously, these various drives and controllers are optional devices.

Program instructions, software or interactive modules for providing the interface and performing any querying or analysis associated with one or more data sets may be stored in the ROM 310 and/or the RAM 315. Optionally, the program instructions may be stored on a tangible computer readable medium such as a compact disk, a digital disk, flash memory, a memory card, a USB drive, an optical disc storage medium, such as a Blu-ray™ disc, and/or other recording medium.

An optional display interface 340 may permit information from the bus 300 to be displayed on the display 345 in audio, visual, graphic or alphanumeric format. Communication with external devices, such as a printing device, may occur using various communication ports 350. A communication port 350 may be attached to a communications network, such as the Internet or an intranet.

The hardware may also include an interface 355 which allows for receipt of data from input devices such as a keyboard 360 or other input device 365 such as a mouse, a joystick, a touch screen, a remote control, a pointing device, a video input device and/or an audio input device.

FIG. 4 illustrates an example of sequential game play, using a tree-like structure of moves. In this example, the system may be used to identify and/or develop a drug and/or drug cocktail for a patient or set of patients who are HIV-positive. The actions of the disease (HIV) at any point in the game model may include, for example, mutation, and the model may include the most likely mutations in the most likely mutating locations (binding sites) of HIV-1 Protease. The disease may take action by selecting and implementing a mutation, or by selecting no mutation. The system may select any of these mutations as an action for the disease during game play.

The actions of the treater at any point in the game model may include selecting a pharmaceutical regimen to prescribe to the patient. The selected pharmaceuticals may be existing drugs and/or drugs designed and/or suggested by the system. Any number of pharmaceutical regimens may be identified and/or selected. The actions of the treater also may include conducting one or more tests on the patient and/or the virus population in the patient. At any decision point, the treater may be able to choose from one of a set of actions, each of which will lead to one or more possible actions by the disease, in a tree-like arrangement such as that illustrated in FIG. 4.

For example, when presented with an HIV-positive patient, the system may give the treater the choice of selecting a first course of treatment 401-a corresponding to a first drug cocktail, or a second course of treatment 402-b corresponding to a different drug cocktail. Each action may result in a different result state 403, 404 for the disease. Next, the disease may make a move. For example, starting from result state 403, the disease may experience a mutation 405-c that worsens the condition of the patient, or it may remain static or go into remission 406-d so that the patient's condition improves. Based on the disease's action, the treater may then select from a set of available next actions 407-e, 408-f in the information set, and so on.
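A tree of this shape can be solved by backward induction. The Python sketch below is only an illustration under stated assumptions: it treats the disease as a worst-case opponent in a zero-sum game with perfect information, the tree loosely mirrors the branching described for FIG. 4, and the action names and leaf utilities (from the treater's perspective) are hypothetical.

```python
def solve(node, history=()):
    """Return (value, plan) for the treater by backward induction.

    A leaf is a number (the treater's utility). An internal node is a pair
    (player, {action: child}). The plan maps each history at which the
    treater moves to the action chosen there, i.e., a contingent plan.
    """
    if isinstance(node, (int, float)):
        return node, {}
    player, children = node
    results = {a: solve(child, history + (a,)) for a, child in children.items()}
    if player == "treater":
        # The treater picks the action with the highest backed-up value.
        best = max(results, key=lambda a: results[a][0])
        value, plan = results[best]
        return value, {history: best, **plan}
    # Disease node: assume the reply that is worst for the treater, but keep
    # the treater's planned response to every possible disease reply.
    value = min(v for v, _ in results.values())
    plan = {}
    for _, sub_plan in results.values():
        plan.update(sub_plan)
    return value, plan


tree = ("treater", {
    "cocktail_1": ("disease", {
        "mutation": ("treater", {"switch_drug": 4, "continue": 1}),
        "remission": ("treater", {"switch_drug": 6, "continue": 8}),
    }),
    "cocktail_2": ("disease", {"mutation": 2, "remission": 9}),
})
value, plan = solve(tree)
```

Here the treater starts with `cocktail_1`, which guarantees a utility of 4, and the returned plan records a different follow-up action depending on whether the disease mutates or goes into remission. Replacing the `min` at disease nodes with an expectation under an opponent model would turn the same recursion into a best response to that model.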

The system may apply a model to predict how well each of the treater actions (i.e., drugs) may address the disease actions (e.g., by binding to each mutation at each site). Any now or hereafter known model may be used, such as those disclosed by: (1) Kamichetty, “Structured Probabilistic Models of Proteins across Spatial and Fitness Landscapes” at pp. 121-127 (Carnegie Mellon University, March 2011); or (2) International Patent Application No. PCT/US2012/026966, filed Feb. 28, 2012, titled “Using game theory in identifying compounds that bind to targets.” The utilities of the players may, for example, be associated with predicted binding energies at the sites. For example, a utility may be the sum of the predicted binding energies across the sites. In some embodiments, the treater's utility may be that sum but with a negative sign because the treater may want to minimize that sum. The output can be a plan over time, that is, how the treater changes the treatment over time. The plan can include a set of possible treatment actions and possible medical condition actions, at least some of which are included in information sets along with a probability distribution for each such action. The output plan can also include one or more contingent plans where the rest of the plan depends on the results of tests.

In some embodiments, game play may occur over multiple paths of the same tree. This may occur, for example, if multiple strands of a virus experience different mutations. Thus, in the context of FIG. 4, game play may occur, for example, along each of the two primary trunks of the tree. The state of the disease at any given point could include all configurations in which the virus is currently modeled to exist in the host.

Systems such as those discussed in this document may provide various benefits. For example, the algorithms can solve game models better than humans can (and in many cases optimally), so there is a potential to generate better treatment plans than doctors and policy committees generate today. In fact, present day manual medical treatment planning is rather ad hoc and unsophisticated from the perspective of the state of the art in game solving algorithms, in particular in the ability to generate high-quality sequential plans. In addition, because the planning is automated, it may be dramatically faster and may require fewer human resources. This means that custom plans can be generated for more specific population segments and even for individual patients. The speed also may enable a user of the system to conduct what-if analyses (sensitivity analysis) to test how the system-generated plan would change under different assumptions about Nature's moves (impact of treatments on patient, accuracy of tests, etc.). This has the potential to also guide where future medical research should be conducted: the most valuable knowledge to generate is the knowledge that will impact the treatment plans.

The description so far has discussed a game model where the disease may have a complex (e.g., high-dimensional) state, but whenever it is the disease's turn to move, it can select only one move. In other words, the disease proceeds down a single path. However, embodiments of the system also include variations where the disease may have a simpler state, but it may be able to proceed down any of multiple paths of a decision tree. This is because the mutation or other progress of a disease can proceed down multiple paths simultaneously, e.g., a human can have multiple strands of a virus simultaneously. The disease has no strategic plan for evolution, but the treater has an advantage of looking ahead and developing contingency plans based on various mutations or evolutionary steps of the disease. The system described in this document may allow the treater to assess multiple courses of action by simultaneously playing multiple paths within the game. In addition, the treater may place one or more traps for the disease. Again, a model can be used to determine whether the traps are likely to lure the disease.

Although there have been prior attempts to use game theory for drug design, those game models have been single-shot games, that is, each player (typically in parallel) chooses one action from a set of actions. The system described in this document uses sequential game models, and it introduces the options of contingent plans, traps, and opponent exploitation, thus providing many more options for the path of play and for game solving than a single-shot game (or a single-shot game analysis of endpoints of a simulation). Also, in the sequential context, the system may capture and predict the effect of information-gathering actions (such as measuring various parameters of the disease or the patient) and employ game-theoretic screening devices. No prior system has employed a model in which a treater is an actual player in the game and in which the treatment actions that the treater may take are used.

The above-disclosed features and functions, as well as alternatives, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements may be made by those skilled in the art, each of which is also intended to be encompassed by the disclosed embodiments.

What is claimed is:
 1. A system for developing a course of treatment for a medical condition, comprising: one or more processors; and a non-transitory, computer-readable memory comprising one or more programming instructions that, when executed, cause one or more of the processors to: identify a description of a sequential game, wherein the game is associated with treatment of a medical condition, wherein the description comprises: one or more possible treatment actions that a treater can take to treat the medical condition, and one or more possible medical condition actions that the medical condition can take; develop a model for the sequential game, wherein the model represents implementation of the possible treatment actions and the possible medical condition actions in one or more sequences; and solve the model to generate a treatment plan for the medical condition, wherein the treatment plan comprises a set of possible treatment actions.
 2. The system of claim 1, wherein the one or more programming instructions that, when executed, cause the one or more processors to solve a model comprise one or more programming instructions that, when executed, cause the one or more processors to generate one or more contingent plans in the model.
 3. The system of claim 1, wherein the plan comprises randomization via behavioral or mixed strategies.
 4. The system of claim 1, wherein the one or more programming instructions that, when executed, cause the one or more processors to solve the model to generate a treatment plan for the medical condition comprise one or more programming instructions that, when executed, cause the one or more processors to: generate a treatment plan comprising one or more traps, where the medical condition is likely to take actions so as to fall into a trap that causes the medical condition to go into one or more of the following: a state in which the medical condition may be more easily treated; a state in which the medical condition is less virulent; or a state in which the medical condition is less contagious; or a state from where the medical condition is less likely to evolve into a harmful state.
 5. The system of claim 1, wherein the one or more programming instructions that, when executed, cause the one or more processors to solve the model comprise one or more programming instructions that, when executed, cause the one or more processors to: apply an opponent model in which the medical condition is able to look ahead at most a set number of steps in the game; and create a path for the medical condition in which a sequence of steps includes one or more steps within the set number that are attractive to the medical condition, and at least one step beyond the set number that is associated with a state of the medical condition that is better for the treater, a patient with the medical condition, or both.
 6. The system of claim 1, wherein the one or more possible treatment actions that a treater can take to treat the medical condition comprise one or more possible treatment actions that a treater can take to treat the medical condition at an individual level, a molecular level, or a population level.
 7. The system of claim 1, wherein: the one or more possible treatment actions comprise one or more possible treatment actions to treat the medical condition at a molecular level via a de novo drug.
 8. The system of claim 1, wherein the one or more possible treatment actions that a treater can take to treat the medical condition comprise one or more of the following: prescribing or administering one or more drugs to a patient having the medical condition; performing a surgical procedure on the patient having the medical condition; applying a therapy to the patient; prescribing a lifestyle change to the patient; admitting the patient to a treatment facility; releasing the patient from the treatment facility; taking one or more measurements of the patient; or taking no action.
 9. The system of claim 1, wherein the description of the sequential game further comprises one or more possible nature actions that a nature player may take relating to treatment of the medical condition, wherein each possible nature action is associated with a probability; and the model represents occurrence of the possible treatment actions, the possible medical condition actions and the possible nature actions in the one or more sequences.
 10. The system of claim 1, wherein the one or more programming instructions that, when executed, cause the one or more processors to solve the model comprise one or more programming instructions that, when executed, cause the one or more processors to solve the model using one or more game theory solution concepts and one or more utilities that are associated with outcomes, intermediate states, or transitions in sequential game play.
 11. The system of claim 10, wherein the one or more utilities comprise a function of a measurement of one or more of the following: health of a patient with the medical condition; a cost to the patient, the treater, or a third party payor; or a current state of the medical condition.
 12. The system of claim 1, wherein the one or more programming instructions that, when executed, cause the one or more processors to solve the model comprise one or more programming instructions that, when executed, cause the one or more processors to: implement an opponent modeling technique; or implement an opponent exploitation technique.
 13. The system of claim 1, wherein the one or more programming instructions that, when executed, cause the one or more processors to solve the model comprise one or more programming instructions that, when executed, cause the one or more processors to: exploit an opponent as an opponent model is improved over time based on experience.
 14. The system of claim 1, wherein the computer-readable memory further comprises one or more programming instructions that, when executed, cause the one or more processors to present at least a portion of the treatment plan to a user via a user interface.
 15. The system of claim 1, wherein the computer-readable memory further comprises one or more programming instructions that, when executed, cause the one or more processors to use information learned while using results of the model to develop an updated model.
 16. The system of claim 1, wherein the one or more programming instructions that, when executed, cause the one or more processors to solve the model comprise one or more programming instructions that, when executed, cause the one or more processors to compute a best-response strategy to an opponent model using stochastic programming.
 17. The system of claim 16, wherein the use of stochastic programming comprises use of one or more of the following: sample trajectory-based optimization, or a policy gradient algorithm.
 18. A method of developing a course of treatment for a medical condition, comprising, by one or more processors: identifying a description of a sequential game, wherein the game is associated with treatment of a medical condition, wherein the description comprises: one or more possible treatment actions that a treater can take to treat the medical condition, and one or more possible medical condition actions that the medical condition can take; developing a model for the sequential game, wherein the model represents implementation of the possible treatment actions and the possible medical condition actions in one or more sequences; and solving the model to generate a treatment plan for the medical condition, wherein the treatment plan comprises a set of possible treatment actions.
 19. The method of claim 18, wherein solving the model comprises: implementing an opponent modeling technique; or implementing an opponent exploitation technique.
 20. The method of claim 1, wherein solving the model comprises using one or more game theory solution concepts and one or more utilities that are associated with outcomes, intermediate states, or transitions in sequential game play, wherein the one or more utilities comprise a function of a measurement of one or more of the following: health of a patient with the medical condition; a cost to the patient, the treater, or a third party payor; or a current state of the medical condition.